As someone interested in business and finance reporting, I wanted to use a dataset that alligned with such interests. Taking into account the current pandemic we are living through, I thought it would be interesting to examine the relief given to businesses through the Small Business Administration’s Paycheck Protection Program (PPP) loans. The dataset I used can be located here: link. The dataset consists of 200,000 entries of PPP loan approvals in the state of Illinois from early April to early August. The dataset includes information on city, zipcode, loan amount, loan approval date, jobs reported, gender, race/ethnicity, lender, non-profit status, veteran status, business type, and NAISC code. As the dataset was incredibly large, I decided focusing on specifically Chicago would allow for more concentrated insight and interview material. The following code shows the manipulation I did to reduce the dataset into just Chicago. As not included data was filled as “Unanswered,” I needed to not only refine the dataset but also put it in a form that would recognize “Unanswered” as NA data.
#read in large IL dataset
PPP_Data <- read_csv("PPP Data up to 150k 080820 IL.csv")
#filtering into just the city Chicago
PPP_chicago <- PPP_Data %>%
filter(City == c("Chicago", "CHICAGO")) # %>%
# replace_with_na_all(condition = ~.x == "Unanswered")
# commented this out for sake of loading my code because the unanswered changes are v tedious
After looking at some graphs in Tableau, I realized that some that were labeled Chicago in fact were not in the Chicago area as the zip placed them in other parts of Illinois. Thus, I decided to instead filter by zipcode. In viewing this data in R, I realized that there was some error on the part of the SBA as two observations had congressional districts that were not in IL. I decided to remove these two observations. This dataset has 49,611 observations which was a little less intimidating than the original amount.
PPP_Chicago <- PPP_Data %>%
filter(Zip %in% c(60007, 60018, 60068, 60106, 60131, 60176, 60601, 60602, 60603, 60604, 60605, 60606, 60607, 60608, 60609, 60610, 60611, 60612, 60613, 60614, 60615, 60616, 60617, 60618, 60619, 60620, 60621, 60622, 60623, 60624, 60625, 60626, 60628, 60629, 60630, 60631, 60632, 60633, 60634, 60636, 60637, 60638, 60639, 60640, 60641, 60642, 60643, 60644, 60645, 60646, 60647, 60649, 60651, 60652, 60653, 60654, 60655, 60656, 60657, 60659, 60660, 60661, 60706, 60707, 60714, 60804, 60827)) %>%
filter(CD != "IA-02", CD != "OH-15") %>%
replace_with_na_all(condition = ~.x == "Unanswered")
I then realized that the NAICS code, which is a government distinction between different types of businesses, while interesting had 881 unique codes in the dataset. Thus, I decided to trim down the NAICS code into the two-digit industry code of which there are only 24. I used an inner join to attach the industry names to the dataset.
PPP_Chicago$NAICSCat <- strtrim(PPP_Chicago$NAICSCode, 2)
NAISC <- read_excel("NAISC.xlsx", col_types = c("text",
"text"))
PPP_Chicago <- PPP_Chicago %>%
inner_join(NAISC, by = "NAICSCat")
In order to analyze this in Tableau, I needed to write out a text file.
write.table(PPP_Chicago, file = "PPP_Chicago.txt", sep = "\t", row.names = FALSE)
I had a few guiding questions that I wanted to think about going in: (1) Are certain groups of individuals likely to get more money from the Paycheck Protection Program? (2) Are certain lenders more prevalent in certain areas? Which banks handled the most money? Did more local lenders seem to prefer to lend to certain demographics or areas? How did this differ from the US population? (3) How did the type of business affect the money recieved? ## Some Preliminary Charts The first thing I was curious about was some general information on broad numbers dealing with PPP loans.
# total dollar amount of loans
sum(PPP_Chicago$LoanAmount)
## [1] 1397429694
# total amount of loans approved
nrow(PPP_Chicago)
## [1] 46472
# average loan granted
mean(PPP_Chicago$LoanAmount)
## [1] 30070.36
# median loan granted
median(PPP_Chicago$LoanAmount)
## [1] 20000
# a box plot of the loan data, the vast amount of outliers makes it difficult to see the stopping point of the right whisker, tells us that average will be somewhat skewed to the higher numbers
ggplot(PPP_Chicago, aes(x = LoanAmount)) +
geom_boxplot(width = .5) +
ylim(-1,1) +
theme(plot.background = element_blank(),
panel.background = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
panel.grid.major.x = element_line(color = "lightgrey"),
axis.line.x = element_line(color = "black"))
Tableau makes line graphs much easier so I was curious about a few observations about loans over time. (1) How many loans were being approved over time (2) How much was being given out in loans over time. The shapes tell us that no particular day seemed to have a greater amount of higher loans than others as the shapes are generally similar.
I wanted to do some preliminary analysis into some distributions (numerically) of different variables to try and answer my first question. The ones I thought would be very interesting, gender and race/ethnicity, unfortunately caused a bit of trouble. Some intial tables showed me that most observations did not include such data. Only 5% of observation included race/ethnicity data and 12% included gender. Thus, while I can use other data and compare to demographic data of Chicago in general (i.e. what are average loan amounts by zip code in comparison to the zipcodes average income or in comparison to the Black and Brown population proportions). This is something I considered as I decided how to build my final visualizations.
kable(PPP_Chicago %>%
count(RaceEthnicity))
| RaceEthnicity | n |
|---|---|
| American Indian or Alaska Native | 11 |
| Asian | 513 |
| Black or African American | 268 |
| Hispanic | 340 |
| Puerto Rican | 1 |
| White | 1396 |
| NA | 43943 |
kable(PPP_Chicago %>%
count(Gender))
| Gender | n |
|---|---|
| Female Owned | 1473 |
| Male Owned | 4682 |
| NA | 40317 |
To start on this final visualization, I made a map on Tableau that shows the average loan amount by zip code.
Caption
After some brief looks into those criteria, I began to look at some of the information that was more complete: lender, jobs reported, business type, and industry codes.
# making table with percentages of loans lenders were in control of (for comparison to national patterns) as well as counts, for sake of space I decided to only put lenders who loaned to more than 10 businesses
x <- PPP_Chicago %>%
count(Lender)
kable(x %>%
filter(n > 10) %>%
mutate(pct = n/nrow(PPP_Chicago)*100) %>%
arrange(desc(pct)))
| Lender | n | pct |
|---|---|---|
| Cross River Bank | 6624 | 14.2537442 |
| Kabbage, Inc. | 5683 | 12.2288690 |
| JPMorgan Chase Bank, National Association | 5479 | 11.7898950 |
| Celtic Bank Corporation | 2637 | 5.6743846 |
| Bank of America, National Association | 2539 | 5.4635049 |
| Fifth Third Bank | 2195 | 4.7232742 |
| U.S. Bank, National Association | 2004 | 4.3122741 |
| WebBank | 1392 | 2.9953520 |
| BMO Harris Bank National Association | 1317 | 2.8339645 |
| Byline Bank | 1075 | 2.3132209 |
| First Midwest Bank | 890 | 1.9151317 |
| Wintrust Bank, National Association | 778 | 1.6741264 |
| PNC Bank, National Association | 757 | 1.6289379 |
| Citibank, N.A. | 614 | 1.3212257 |
| TCF National Bank | 600 | 1.2911000 |
| MBE Capital Partners | 497 | 1.0694612 |
| Lakeside Bank | 440 | 0.9468067 |
| International Bank of Chicago | 401 | 0.8628852 |
| Customers Bank | 397 | 0.8542778 |
| CIBC Bank USA | 354 | 0.7617490 |
| First American Bank | 322 | 0.6928903 |
| The Huntington National Bank | 303 | 0.6520055 |
| First Secure Bank and Trust Co. | 253 | 0.5444138 |
| Readycap Lending, LLC | 248 | 0.5336547 |
| Radius Bank | 229 | 0.4927698 |
| Beverly Bank & Trust Company, National Association | 228 | 0.4906180 |
| Intuit Financing Inc. | 227 | 0.4884662 |
| Signature Bank | 212 | 0.4561887 |
| Fundbox, Inc. | 211 | 0.4540368 |
| Park Ridge Community Bank | 211 | 0.4540368 |
| Providence Bank & Trust | 202 | 0.4346703 |
| Village Bank and Trust, National Association | 196 | 0.4217593 |
| Bank of Hope | 190 | 0.4088483 |
| Parkway Bank and Trust Company | 190 | 0.4088483 |
| Republic Bank of Chicago | 190 | 0.4088483 |
| Self-Help FCU | 181 | 0.3894818 |
| Associated Bank, National Association | 180 | 0.3873300 |
| Union National Bank | 157 | 0.3378378 |
| CRF Small Business Loan Company, LLC | 154 | 0.3313823 |
| Belmont Bank & Trust Company | 136 | 0.2926493 |
| Millennium Bank | 131 | 0.2818902 |
| Marquette Bank | 124 | 0.2668273 |
| First Eagle Bank | 120 | 0.2582200 |
| The Northern Trust Company | 111 | 0.2388535 |
| West Suburban Bank | 107 | 0.2302462 |
| Town Center Bank | 106 | 0.2280943 |
| Hinsdale Bank & Trust Company, National Association | 103 | 0.2216388 |
| Itasca Bank & Trust Co. | 103 | 0.2216388 |
| Cornerstone National Bank & Trust Company | 99 | 0.2130315 |
| Lake Forest Bank & Trust Company, National Association | 82 | 0.1764503 |
| Wells Fargo Bank, National Association | 80 | 0.1721467 |
| Newtek Small Business Finance, Inc. | 78 | 0.1678430 |
| American Express National Bank | 77 | 0.1656912 |
| Albany Bank and Trust Company, National Association | 76 | 0.1635393 |
| Selfreliance FCU | 75 | 0.1613875 |
| Busey Bank | 72 | 0.1549320 |
| First Secure Community Bank | 72 | 0.1549320 |
| Northbrook Bank & Trust Company, National Association | 72 | 0.1549320 |
| Northeast Bank | 72 | 0.1549320 |
| BankFinancial, National Association | 71 | 0.1527802 |
| First Home Bank | 68 | 0.1463247 |
| Glenview State Bank | 68 | 0.1463247 |
| Royal Savings Bank | 68 | 0.1463247 |
| Old Second National Bank | 66 | 0.1420210 |
| Devon Bank | 63 | 0.1355655 |
| The First National Bank of Ottawa | 62 | 0.1334137 |
| Inland Bank and Trust | 61 | 0.1312618 |
| Heartland Bank and Trust Company | 60 | 0.1291100 |
| Burling Bank | 58 | 0.1248063 |
| Gold Coast Bank | 58 | 0.1248063 |
| Libertyville Bank & Trust Company, National Association | 58 | 0.1248063 |
| Schaumburg Bank & Trust Company, National Association | 56 | 0.1205027 |
| Harvest Small Business Finance, LLC | 52 | 0.1118953 |
| Midwest Community Bank | 51 | 0.1097435 |
| Forest Park National Bank and Trust Company | 48 | 0.1032880 |
| Hanmi Bank | 48 | 0.1032880 |
| Waterman State Bank | 47 | 0.1011362 |
| Itria Ventures LLC | 46 | 0.0989843 |
| Live Oak Banking Company | 46 | 0.0989843 |
| First Bank of Highland Park | 45 | 0.0968325 |
| Peoples Bank | 43 | 0.0925288 |
| CDC Small Business Finance Corporation | 42 | 0.0903770 |
| American Metro Bank | 41 | 0.0882252 |
| FC Marketplace, LLC (dba Funding Circle) | 40 | 0.0860733 |
| Grand Ridge National Bank | 40 | 0.0860733 |
| BSD Capital, LLC dba Lendistry | 35 | 0.0753142 |
| CNB Bank and Trust, National Association | 35 | 0.0753142 |
| Old Plank Trail Community Bank, National Association | 35 | 0.0753142 |
| American Community Bank & Trust | 34 | 0.0731623 |
| First Bank of the Lake | 34 | 0.0731623 |
| Silicon Valley Bank | 34 | 0.0731623 |
| Evergreen Bank Group | 33 | 0.0710105 |
| First Bank and Trust Company of Illinois | 32 | 0.0688587 |
| Centier Bank | 29 | 0.0624032 |
| Oxford Bank & Trust | 29 | 0.0624032 |
| Wheaton Bank & Trust Company, National Association | 29 | 0.0624032 |
| CenTrust Bank, National Association | 28 | 0.0602513 |
| Great Lakes CU | 28 | 0.0602513 |
| Midland States Bank | 28 | 0.0602513 |
| St. Charles Bank & Trust Company, National Association | 28 | 0.0602513 |
| Capital One, National Association | 27 | 0.0580995 |
| Fountainhead SBF LLC | 27 | 0.0580995 |
| North Side Federal Savings & Loan Association of Chicago | 26 | 0.0559477 |
| Greater Nevada CU | 24 | 0.0516440 |
| Merchants and Manufacturers Bank | 24 | 0.0516440 |
| Pacific City Bank | 23 | 0.0494922 |
| 1st Equity Bank | 22 | 0.0473403 |
| Allied First Bank | 22 | 0.0473403 |
| Community Bank of Elmhurst | 22 | 0.0473403 |
| Citizens Bank, National Association | 21 | 0.0451885 |
| First Bank of Manhattan | 21 | 0.0451885 |
| The Leaders Bank | 21 | 0.0451885 |
| Transportation Alliance Bank, Inc. d/b/a TAB Bank, Inc. | 21 | 0.0451885 |
| Brickyard Bank | 20 | 0.0430367 |
| Cache Valley Bank | 20 | 0.0430367 |
| Citizens Bank of Chatsworth | 20 | 0.0430367 |
| Crystal Lake Bank & Trust Company, National Association | 20 | 0.0430367 |
| United Fidelity Bank, FSB | 20 | 0.0430367 |
| American Eagle Bank | 19 | 0.0408848 |
| FNBC Bank and Trust | 19 | 0.0408848 |
| Algonquin State Bank | 18 | 0.0387330 |
| Bank of the West | 18 | 0.0387330 |
| Barrington Bank & Trust Company, National Association | 18 | 0.0387330 |
| Citizens Bank | 18 | 0.0387330 |
| La Salle State Bank | 18 | 0.0387330 |
| Legacy Bank | 18 | 0.0387330 |
| Notre Dame FCU | 18 | 0.0387330 |
| Bank of Pontiac | 17 | 0.0365812 |
| Idaho First Bank | 17 | 0.0365812 |
| The Bancorp Bank | 17 | 0.0365812 |
| American Community Bank of Indiana | 15 | 0.0322775 |
| First Merchants Bank | 15 | 0.0322775 |
| Navy FCU | 15 | 0.0322775 |
| Amalgamated Bank of Chicago | 14 | 0.0301257 |
| Cathay Bank | 14 | 0.0301257 |
| Hometown National Bank | 14 | 0.0301257 |
| BankUnited, National Association | 13 | 0.0279738 |
| Blue Ridge Bank, National Association | 13 | 0.0279738 |
| Hiawatha National Bank | 13 | 0.0279738 |
| Horizon Bank | 13 | 0.0279738 |
| McHenry Savings Bank | 13 | 0.0279738 |
| NorthSide Community Bank | 13 | 0.0279738 |
| The State Bank | 13 | 0.0279738 |
| Baxter CU | 12 | 0.0258220 |
| Carrollton Bank | 12 | 0.0258220 |
| First National Bank of Omaha | 12 | 0.0258220 |
| First Republic Bank | 12 | 0.0258220 |
| Stearns Bank National Association | 12 | 0.0258220 |
| United Community Bank | 12 | 0.0258220 |
| Northwest FCU | 11 | 0.0236702 |
| Princeville State Bank | 11 | 0.0236702 |
I wanted to explore lenders more. I found some data from how nationally loans were dispersed by lender. Though some of the big actors were also big in Chicago, there were a few notable differences. For instance, the second largest number of loans came from Kabbage, Inc. in Chicago, while it did not break the top 15 nationally. Additionally, it appears that some of the larger lenders on the national list had even more authority in the Chicago area. While JPMorgan Chase loaned 4.4% of loans nationally, they handled almost 12% of all PPP loans from this first round in the Chicago area. Some of the banks with bigger shares of total authority that did not appear on the national list appear to be Chicago/Midwest based: Byline Bank, First Midwest Bank, TCF National Bank (more midwest base), Lakeside Bank, and International Bank of Chicago. I was interested to see if these banks seemed to lend to certain areas more than the other larger banks. Businesses in the north side seemed to have access to more lenders than those on the south side, in general.
In comparison, here is a map of how many loans were dispersed throughout Chicago:
What was very intersting was that though there were a large amount of loans dispersed 60620, 60619, and 60628, there was less diversity in lenders. This led me to another question: who was dominating lending in these areas?
I was interested in seeing how this broke up by the Chicago/Midwest based big lenders versus the big 15 that nationally had a lot of the total loan authority. I realized after looking at the visualizations that one of the reasons why these maps might look so similar was because the number of lenders in the smaller percentage areas had more lenders that probably diluted the authority of these two groups I looked at.
I decided to go back to my original question of how lenders by individual lender vary by zip code. I needed to do some maneauvering to obtain a dataset that gave me for each zipcode the lender with the most loans. I thought this might provide some insight that the previous two visualizations couldn’t offer.
y <- PPP_Chicago %>%
group_by(Zip) %>%
count(Lender)
y <- y %>%
group_by(Zip) %>%
mutate(BigLend = max(n)) %>%
filter(n == BigLend)
write.table(y, file = "MaxLender.txt", sep = "\t", row.names = FALSE)
This graphic was so interesting to me, and I knew I wanted to use it in my story. The divide between the North and South is fairly stark as the lenders seem segregated to certain areas. I decided I wanted to look at this in my story and do some research into the marketing tactics of specifically Kabbage, INc. and Cross River Bank. I also wanted to see if they had any programs that were focused on these areas.
With this in mind, I shifted my attention to business type.
# making a table looking at the specifics of business types
kable(PPP_Chicago %>%
group_by(BusinessType) %>%
summarise(avgloan = mean(LoanAmount),
totalloans = sum(LoanAmount),
medloan = median(LoanAmount),
avgsize = mean(JobsReported, na.rm = TRUE)) %>%
arrange(desc(totalloans)))
| BusinessType | avgloan | totalloans | medloan | avgsize |
|---|---|---|---|---|
| Corporation | 38859.99 | 501915653.0 | 25200.00 | 6.100379 |
| Limited Liability Company(LLC) | 38615.10 | 303939472.3 | 22959.00 | 5.183796 |
| Subchapter S Corporation | 36817.15 | 234414770.5 | 22940.00 | 4.851329 |
| Sole Proprietorship | 16750.05 | 168137018.7 | 20000.00 | 2.394313 |
| Self-Employed Individuals | 15395.19 | 70232855.2 | 18750.00 | 1.258774 |
| Non-Profit Organization | 45098.47 | 50600482.9 | 33763.75 | 7.036630 |
| Independent Contractors | 12066.85 | 34414669.9 | 12500.00 | 1.173048 |
| Partnership | 40225.87 | 9855337.6 | 29358.77 | 4.447257 |
| Limited Liability Partnership | 51303.60 | 8721612.8 | 42888.50 | 7.503106 |
| Professional Association | 38493.65 | 6736389.4 | 24988.00 | 6.294118 |
| NA | 55263.47 | 4476341.0 | 47400.00 | 6.312500 |
| Cooperative | 53391.43 | 3523834.1 | 40250.00 | 5.906250 |
| Non-Profit Childcare Center | 54370.20 | 271851.0 | 52220.00 | 7.600000 |
| Joint Venture | 94702.62 | 189405.2 | 94702.62 | 10.000000 |
The chart was particularly interesting to me as the business type seemed to make a big difference to how much in loans one was able to get. I created the following visualization to try and gain some insight. I first created a box plot that showed each business type and the loan distributions. I was aware that size of the company might be a factor among why certain groups (sole proprieters, individual contractors, and self-employed people) may have lower loan averages/medians. Thus, I decided to look at some regression lines on scatterplots of job size and loan amount, grouped by business type. I removed any extreme outliers from the data to avoid any values that would vastly change the regression. I also removed to categories that had very few points: non-profit childcare center and joint venture.
The regression trend lines are much flatter for independent contractors, sole proprietors, and self-employed individuals. Thus, even when additional jobs were added in those business types, the impact of another person on payroll increased the average loan amount recieved by much less than corporations and LLCs. I was interested in this question particularly: what made these groups less likely to recieve more money. In my story, I explore this using visualizations as well as outside reporting context through Lotika Pai, managing director at the Women’s Business Development Center Access to Capital group.
Looking at business type made me start to think about how such differences look at an industry level.
# making a table looking at the specifics of industries
kable(PPP_Chicago %>%
group_by(Industry) %>%
summarise(avgloan = mean(LoanAmount),
totalloans = sum(LoanAmount),
medloan = median(LoanAmount),
avgsize = mean(JobsReported, na.rm = TRUE)) %>%
arrange(desc(totalloans)))
| Industry | avgloan | totalloans | medloan | avgsize |
|---|---|---|---|---|
| Professional, Scientific, and Technical Services | 32591.16 | 229409183.4 | 20805.00 | 2.851852 |
| Other Services (except Public Administration) | 26176.60 | 159101399.2 | 19500.00 | 3.945164 |
| Health Care and Social Assistance | 37053.64 | 150326602.6 | 21670.20 | 4.990464 |
| Accommodation and Food Services | 40881.86 | 150240845.9 | 25535.00 | 9.580398 |
| Retail Trade | 26617.13 | 107586454.0 | 19253.00 | 4.612694 |
| Transportation and Warehousing | 17645.84 | 80500307.6 | 14090.09 | 2.185086 |
| Construction | 29392.66 | 76891209.7 | 19226.00 | 3.289346 |
| Real Estate Rental and Leasing | 29552.85 | 76187241.4 | 20332.00 | 3.257446 |
| Manufacturing | 46417.04 | 73153253.2 | 30637.50 | 5.694133 |
| Finance and Insurance | 26345.81 | 72108487.7 | 20000.00 | 2.159023 |
| Wholesale Trade | 42320.83 | 58106499.5 | 24410.50 | 4.847432 |
| Administrative and Support and Waste Management and Remediation Services | 28277.99 | 51918391.9 | 19137.00 | 4.392576 |
| Arts, Entertainment, and Recreation | 22546.42 | 32399208.6 | 16582.00 | 4.932692 |
| Information | 34496.39 | 26665709.1 | 20508.00 | 3.332875 |
| Educational Services | 27889.66 | 26327838.9 | 19499.50 | 5.014706 |
| Agriculture, Forestry, Fishing and Hunting | 20512.29 | 17168787.5 | 20000.00 | 6.666667 |
| Public Administration | 33942.78 | 4039190.9 | 20833.00 | 3.267241 |
| Management of Companies and Enterprises | 26430.88 | 3964632.5 | 20000.00 | 5.586466 |
| Mining | 36108.51 | 686061.8 | 20833.00 | 4.578947 |
| Utilities | 27016.18 | 648388.3 | 18929.50 | 3.000000 |
What was interesting about this table was that professional, scientific, and technical services had the largest amount of loans (in dollars) given to them but had a lower loan average than a lot of other industries. Had I had more time for reporting I would be interested to see exactly what each classification means/consists of in Chicago (perhaps talking to some local small business focused resource centers). I decided to make a similar box plot to the business types box plot to again look at the distribution.
The transportatio nand warehousing industry seemed to have the lowest median loan amount while still having a vast array of outliers. The industries were also something I wanted to explore in my story. I made to bar charts whose information I wanted to combine into one:
Looking at the two, the information is valuable in each. However, layering the average loans over the total will be difficult because of how much higher the sum of all loans is in comparison to the individual. As a result, I was deciding how to display such information and realized that perhaps average loan amount and count gives a lot of valuable information. In addition, having a median line to show median as there are so many outliers may also be helpful.
I decided that a separate graph that perhaps shows the makeup of all the PPP loans would perhaps fit better. Since some portions are incredible small or were not given, I decided to create an other category that encompassed those categories: null, management of companies, mining, public administration, and utilities. Though pie charts make it difficult to detect small differences, I wanted some way to show the composition of PPP loans by industry as a part of the whole.
Because we were encouraged to host the final piece on a website, I decided to use ArcGIS’s Storymap software to host both my story and to enable me to add interactivity to my visualization. A lot of the most interesting data I found was through looking at maps thus I decided to use the ArcGIS interface to create interactive maps alongside my narrative.
The first challenge was seeing how to combine existing data from ArcGIS online into my dataset as unlike Tableau, ArcGIS requires shapefiles in order to draw zipcode boundaries. I played around, attempting to merge layers which proved unsuccessful but through use of a join features command, I was able to add my data to existing shapefile data.
Using this technique, I was able to add my information I wanted. However, because the dataset was quite large, I had to create individual datasets that included the map data I was interested in.
PPP_condensed <- PPP_Chicago %>%
group_by(Zip) %>%
summarise(avgloan = mean(LoanAmount),
totalloans = sum(LoanAmount),
medloan = median(LoanAmount),
avgsize = mean(JobsReported, na.rm = TRUE),
lenders = n_distinct(Lender),
loans = n())
write.table(PPP_condensed, file = "PPP-small.txt", sep = "\t", row.names = FALSE)